
85 research outputs found

Enumerating the k closest pairs optimally

Author: Lenhof H.
Smid M.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/1992

Let $S$ be a set of $n$ points in $D$-dimensional space, where $D$ is a constant, and let $k$ be an integer between $1$ and $\binom{n}{2}$. An algorithm is given that computes the $k$ closest pairs in the set $S$ in $O(n \log n + k)$ time, using $O(n+k)$ space. The algorithm fits in the algebraic decision tree model and is, therefore, optimal.
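
A brute-force baseline can make the problem statement concrete. The sketch below is not the algorithm from the report: it scans all pairs and keeps the k smallest, which costs roughly O(n^2 log k) time instead of O(n log n + k). The function name `k_closest_pairs` and the representation of points as coordinate tuples are assumptions made for illustration.

```python
import heapq
from itertools import combinations
from math import dist


def k_closest_pairs(points, k):
    """Return the k closest pairs as (distance, p, q) tuples, nearest first.

    Naive baseline: enumerates every pair of points, so it is quadratic in
    the number of points. Points are assumed to be tuples of numbers.
    """
    pairs = ((dist(p, q), p, q) for p, q in combinations(points, 2))
    # nsmallest keeps only k candidates in memory while scanning the pairs
    return heapq.nsmallest(k, pairs)


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (5, 5), (1, 1)]
    for d, p, q in k_closest_pairs(pts, 3):
        print(f"{p} - {q}: {d:.3f}")
```

Such a baseline is mainly useful for checking a faster implementation on small inputs.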

MPG.PuRe

Computational Molecular Biology

Author: Lenhof H.
Mutzel P.
Vingron M.
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/1996

Computational Biology is a fairly new subject that arose in response to the computational problems posed by the analysis and the processing of biomolecular sequence and structure data. The field was initiated in the late 60's and early 70's largely by pioneers working in the life sciences. Physicists and mathematicians entered the field in the 70's and 80's, while Computer Science became involved with the new biological problems in the late 1980's. Computational problems have gained further importance in molecular biology through the various genome projects which produce enormous amounts of data. For this bibliography we focus on those areas of computational molecular biology that involve discrete algorithms or discrete optimization. We thus neglect several other areas of computational molecular biology, like most of the literature on the protein folding problem, as well as databases for molecular and genetic data, and genetic mapping algorithms. Due to the availability of review papers and a bibliography this bibliography

MPG.PuRe

EDISON-WMW: Exact Dynamic Programing Solution of the Wilcoxon-Mann-Whitney Test

Author: Backes C.
Keller A.
Lenhof H.
Marx A.
Meese E.
Publication venue: Elsevier BV
Publication date: 01/01/2016

In many research disciplines, hypothesis tests are applied to evaluate whether findings are statistically significant or could be explained by chance. The Wilcoxon–Mann–Whitney (WMW) test is among the most popular hypothesis tests in medicine and life science to analyze if two groups of samples are equally distributed. This nonparametric statistical homogeneity test is commonly applied in molecular diagnosis. Generally, the solution of the WMW test takes a high combinatorial effort for large sample cohorts containing a significant number of ties. Hence, the P value is frequently approximated by a normal distribution. We developed EDISON-WMW, a new approach to calculate the exact permutation of the two-tailed unpaired WMW test without any corrections required and allowing for ties. The method relies on dynamic programming to solve the combinatorial problem of the WMW test efficiently. Beyond a straightforward implementation of the algorithm, we presented different optimization strategies and developed a parallel solution. Using our program, the exact P value for large cohorts containing more than 1000 samples with ties can be calculated within minutes. We demonstrate the performance of this novel approach on randomly generated data, benchmark it against 13 other commonly applied approaches and, moreover, evaluate molecular biomarkers for lung carcinoma and chronic obstructive pulmonary disease (COPD). We found that approximated P values were generally higher than the exact solution provided by EDISON-WMW. Importantly, the algorithm can also be applied to high-throughput omics datasets, where hundreds or thousands of features are included. To provide easy access to the multi-threaded version of EDISON-WMW, a web-based solution of our algorithm is freely available at http://www.ccb.uni-saarland.de/software/wtest/
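
To illustrate the kind of counting an exact WMW P value requires, here is a minimal sketch of the textbook dynamic program for the tie-free case. It is not the EDISON-WMW algorithm, which also handles ties and runs in parallel; the function names `u_distribution` and `exact_wmw_pvalue` are hypothetical, and the absence of ties is an assumption.

```python
from math import comb


def u_distribution(m, n):
    """counts[u] = number of rank assignments with Mann-Whitney U == u (no ties)."""
    total_ranks = m + n
    max_sum = m * n + m * (m + 1) // 2  # rank sum when group 1 holds the top m ranks
    # ways[j][s]: number of j-element subsets of the ranks seen so far with sum s
    ways = [[0] * (max_sum + 1) for _ in range(m + 1)]
    ways[0][0] = 1
    for r in range(1, total_ranks + 1):
        # j runs downward so that each rank is used at most once per subset
        for j in range(min(r, m), 0, -1):
            for s in range(r, max_sum + 1):
                ways[j][s] += ways[j - 1][s - r]
    offset = m * (m + 1) // 2  # smallest possible rank sum, i.e. U == 0
    return [ways[m][offset + u] for u in range(m * n + 1)]


def exact_wmw_pvalue(x, y):
    """Exact two-sided P value; assumes no value occurs in both samples or twice."""
    m, n = len(x), len(y)
    u_obs = sum(xi > yj for xi in x for yj in y)
    counts = u_distribution(m, n)
    # Compare |2U - mn| to stay in integers; the null distribution is symmetric
    dev = abs(2 * u_obs - m * n)
    extreme = sum(c for u, c in enumerate(counts) if abs(2 * u - m * n) >= dev)
    return extreme / comb(m + n, m)


if __name__ == "__main__":
    print(exact_wmw_pvalue([1.2, 2.5, 3.1], [4.0, 5.7, 6.3]))
```

The table has on the order of m(m+n) rows of updates over sums up to mn + m(m+1)/2, which is already far cheaper than enumerating all group assignments.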

Elsevier - Publisher Connector

Directory of Open Access Journals

PubMed Central

MPG.PuRe

RegulatorTrail: A Web Service for the Identification of Key Transcriptional Regulators

Author: Backes C.
Gerstner N.
Kehl T.
Keller A.
Lenhof H.
Meese E.
Schmidt F.
Schneider L.
Schulz M.
Stöckel D.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2017

MPG.PuRe

Algorithm engineering for optimal alignment of protein structure distance matrices

Author: A. Andreeva
A. Caprara
A. Marin
A. Schrijver
C. Berbalk
D. Wu
D.A. Pelta
E. Althaus
G. Mayr
Gunnar W. Klau
H. Hasegawa
H.P. Lenhof
I. Wohlers
Inken Wohlers
L. Holm
N. Malod-Dognin
P. Di Lena
R. Andonov
R. Kolodny
R.H. Lathrop
Rumen Andonov
T. Havel
T. Kawabata
W. Xie
W.R. Taylor
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

Protein structural alignment is an important problem in computational biology. In this paper, we present first successes on provably optimal pairwise alignment of protein inter-residue distance matrices, using the popular Dali scoring function. We introduce the structural alignment problem formally, which enables us to express a variety of scoring functions used in previous work as special cases in a unified framework. Further, we propose the first mathematical model for computing optimal structural alignments based on dense inter-residue distance matrices. We therefore reformulate the problem as a special graph problem and give a tight integer linear programming model. We then present algorithm engineering techniques to handle the huge integer linear programs of real-life distance matrix alignment problems. Applying these techniques, we can compute provably optimal Dali alignments for the very first time

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

Crossref

CWI's Institutional Repository

INRIA a CCSD electronic archive server

HAL-Rennes 1

Systematic permutation testing in GWAS pathway analyses: identification of genetic networks in dilated cardiomyopathy and ulcerative colitis

Author: Backes Christina
Franke Andre
Frese Karen
Haas Jan
Katus Hugo
Keller Andreas
Kloos Wanda
Lenhof Hans-Peter
Lieb Wolfgang
Meder Benjamin
Meese Eckart
Rühle Frank
Stoll Monika
Weis Tanja
Wichmann H.-Erich
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

Background: Genome-wide association studies (GWAS) are applied to identify genetic loci that are associated with complex traits and human diseases. Analogous to the evolution of gene expression analyses, pathway analyses have emerged as important tools to uncover functional networks of genome-wide association data. Usually, pathway analyses combine statistical methods with a priori available biological knowledge. To determine significance thresholds for associated pathways, correction for multiple testing and over-representation permutation testing is applied. Results: We systematically investigated the impact of three different permutation test approaches for over-representation analysis to detect false positive pathway candidates and evaluated them on genome-wide association data of Dilated Cardiomyopathy (DCM) and Ulcerative Colitis (UC). Our results provide evidence that the gold standard, permuting the case–control status, effectively improves the specificity of GWAS pathway analysis. Although permutation of SNPs does not maintain linkage disequilibrium (LD), these permutations represent an alternative for GWAS data when case–control permutations are not possible. Gene permutations, however, did not add significantly to the specificity. Finally, we provide estimates on the required number of permutations for the investigated approaches. Conclusions: To discover potential false positive functional pathway candidates and to support the results from standard statistical tests such as the Hypergeometric test, permutation tests of case–control data should be carried out. The most reasonable alternative was case–control permutation; if this is not possible, SNP permutations may be carried out. Our study also demonstrates that significance values converge rapidly with an increasing number of permutations.
By applying the described statistical framework, we were able to discover axon guidance, focal adhesion and calcium signaling as important DCM-related pathways and Intestinal immune network for IgA production as the most significant UC pathway.
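
The relationship between the Hypergeometric test and a permutation-based P value can be sketched in a few lines. The example below covers only the simplest of the three schemes, gene permutation, because case–control and SNP permutation need genotype data that a short example cannot reproduce. It is an illustration under stated assumptions, not the study's pipeline: the function names, the default of 1000 permutations, the fixed seed, and the add-one correction are all choices made here.

```python
import random
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) when n genes are drawn from N, of which K belong to the pathway."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def gene_permutation_pvalue(all_genes, pathway, hits, n_perm=1000, seed=0):
    """Empirical P value from random gene lists of the same size as `hits`.

    all_genes must be a list (random.sample needs a sequence).
    """
    rng = random.Random(seed)
    pathway = set(pathway)
    observed = len(pathway & set(hits))
    at_least = sum(
        len(pathway & set(rng.sample(all_genes, len(hits)))) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the estimate strictly above zero
    return (at_least + 1) / (n_perm + 1)


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(100)]
    pathway = genes[:10]
    hits = genes[:3] + genes[50:57]  # 3 of the 10 hits fall in the pathway
    print(hypergeom_pvalue(len(genes), len(pathway), len(hits), 3))
    print(gene_permutation_pvalue(genes, pathway, hits))
```

With enough permutations the empirical value should approach the hypergeometric tail probability, since both describe drawing gene lists uniformly at random.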

Crossref

Springer

Springer - Publisher Connector

Heidelberger Dokumentenserver

Open Access LMU

PubMed Central

PuSH

Louse (Insecta : Phthiraptera) mitochondrial 12S rRNA secondary structure is highly variable

Author: Billoud B.
Collins L.J.
Corpet F.
Critchlow D.E.
Day W.H.E.
Fontana W.
Gutell R.R.
Hafner M.S.
Hickson R.E.
Hofacker I.L.
Houde P.
Johnson K.P.
K. P. Johnson
Konings D.A.M.
Lenhof H.-P.
Lockhart P.J.
Mindell D.P.
Moran N.A.
Page R.D.M.
R. Cruickshank
R. D. M. Page
Shao R.
Simon C.
Springer M.S.
Stoye J.
Swofford D.L.
Wheeler W.C.
Publication venue: Wiley
Publication date: 01/01/2002

Lice are ectoparasitic insects hosted by birds and mammals. Mitochondrial 12S rRNA sequences obtained from lice show considerable length variation and are very difficult to align. We show that the louse 12S rRNA domain III secondary structure displays considerable variation compared to other insects, in both the shape and number of stems and loops. Phylogenetic trees constructed from tree edit distances between louse 12S rRNA structures do not closely resemble trees constructed from sequence data, suggesting that at least some of this structural variation has arisen independently in different louse lineages. Taken together with previous work on mitochondrial gene order and elevated rates of substitution in louse mitochondrial sequences, the structural variation in louse 12S rRNA confirms the highly distinctive nature of molecular evolution in these insects

CiteSeerX

Crossref

Enlighten

Computation of significance scores of unweighted Gene Set Enrichment Analyses

Author: A Subramanian
A Zanzoni
Andreas Keller
C Backes
Christina Backes
E Rubin
H Hermjakob
H Lee
Hans-Peter Lenhof
J Küntzer
J Lamb
L Salwinski
M Kanehisa
M Krull
S Kim
S Peri
S Wachi
T Barrett
TGO Consortium
V Matys
V Mootha
Y Benjamini
Y Hochberg
Z Jiang
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Gene Set Enrichment Analysis (GSEA) is a computational method for the statistical evaluation of sorted lists of genes or proteins. Originally GSEA was developed for interpreting microarray gene expression data, but it can be applied to any sorted list of genes. Given the gene list and an arbitrary biological category, GSEA evaluates whether the genes of the considered category are randomly distributed or accumulated on top or bottom of the list. Usually, significance scores (p-values) of GSEA are computed by nonparametric permutation tests, a time-consuming procedure that yields only estimates of the p-values. Results: We present a novel dynamic programming algorithm for calculating exact significance values of unweighted Gene Set Enrichment Analyses. Our algorithm avoids typical problems of nonparametric permutation tests, such as varying findings in different runs caused by the random sampling procedure. Another advantage of the presented dynamic programming algorithm is its runtime and memory efficiency. To test our algorithm, we applied it not only to simulated data sets, but additionally evaluated expression profiles of squamous cell lung cancer tissue and autologous unaffected tissue.
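
One way to see how dynamic programming can replace permutation sampling here is to count, over a small grid, the gene orderings whose running sum never reaches the observed extreme. The sketch below follows that idea for an unweighted, two-sided running-sum statistic; it is an illustration under stated assumptions and not necessarily the authors' formulation. The step sizes (+(m - l) for a category gene, -l otherwise) and the function names are choices made here.

```python
from math import comb


def running_sum_max(in_set):
    """Largest absolute value of the unweighted running sum.

    in_set: booleans over the sorted gene list, True for category members.
    """
    m, l = len(in_set), sum(in_set)
    rs, best = 0, 0
    for hit in in_set:
        rs += (m - l) if hit else -l
        best = max(best, abs(rs))
    return best


def exact_gsea_pvalue(in_set):
    """Fraction of all C(m, l) orderings whose running sum reaches the observed extreme."""
    m, l = len(in_set), sum(in_set)
    obs = running_sum_max(in_set)
    # inside[i][j]: orderings of i members and j non-members whose running
    # sum stays strictly inside (-obs, obs) at every step
    inside = [[0] * (m - l + 1) for _ in range(l + 1)]
    for i in range(l + 1):
        for j in range(m - l + 1):
            if abs(i * (m - l) - j * l) >= obs:
                continue  # this prefix already reaches the observed extreme
            if i == 0 and j == 0:
                inside[i][j] = 1
            else:
                from_hit = inside[i - 1][j] if i else 0
                from_miss = inside[i][j - 1] if j else 0
                inside[i][j] = from_hit + from_miss
    return 1 - inside[l][m - l] / comb(m, l)


if __name__ == "__main__":
    ranked = [True, True, False, True, False, False, False, False]
    print(exact_gsea_pvalue(ranked))
```

The table has (l + 1)(m - l + 1) cells, so the result is exact and identical on every run, unlike an estimate from random permutations.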

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Clinical predictors of long-term survival in newly diagnosed transplant eligible multiple myeloma - an IMWG Research Project

Author: Attal M. (Michele)
Barlogie B. (Bart)
Cavo M. (Michele)
Durie B. (B.)
Goldschmidt H. (Hartmut)
Hajek R. (R.)
Hoering A. (Antje)
Kumar S. (Shaji)
Lahuerta J.J. (Juan José)
Lee J.H. (Jae Hoon)
Lenhof S. (Stig)
Moreau P. (Philippe)
Morgan G.J. (Gareth J.)
Rajkumar S.V. (S. Vincent)
San-Miguel J.F. (Jesús F.)
Turesson I. (Ingemar)
Usmani S.Z. (Saad Z.)
Publication venue: Springer Nature
Publication date: 01/01/2018

Purpose: multiple myeloma is considered an incurable hematologic cancer but a subset of patients can achieve long-term remissions and survival. The present study examines the clinical features of long-term survival as it correlates to depth of disease response. Patients & Methods: this was a multi-institutional, international, retrospective analysis of high-dose melphalan-autologous stem cell transplant (HDM-ASCT) eligible MM patients included in clinical trials. Clinical variable and survival data were collected from 7291 MM patients from Czech Republic, France, Germany, Italy, Korea, Spain, the Nordic Myeloma Study Group and the United States. Kaplan–Meier curves were used to assess progression-free survival (PFS) and overall survival (OS). Relative survival (RS) and statistical cure fractions (CF) were computed for all patients with available data. Results: achieving CR at 1 year was associated with superior PFS (median PFS 3.3 years vs. 2.6 years, p < 0.0001) as well as OS (median OS 8.5 years vs. 6.3 years, p < 0.0001). Clinical variables at diagnosis associated with 5-year survival and 10-year survival were compared with those associated with 2-year death. In multivariate analysis, age over 65 years (OR 1.87, p = 0.002), IgA Isotype (OR 1.53, p = 0.004), low albumin < 3.5 g/dL (OR = 1.36, p = 0.023), elevated beta 2 microglobulin ≥ 3.5 mg/dL (OR 1.86, p < 0.001), serum creatinine levels ≥ 2 mg/dL (OR 1.77, p = 0.005), hemoglobin levels < 10 g/dL (OR 1.55, p = 0.003), and platelet count < 150k/μL (OR 2.26, p < 0.001) appeared to be negatively associated with 10-year survival. The relative survival for the cohort was ~0.9, and the statistical cure fraction was 14.3%. Conclusions: these data identify CR as an important predictor of long-term survival for HDM-ASCT eligible MM patients. They also identify clinical variables reflective of higher disease burden as poor prognostic markers for long-term survival

Universidad de Navarra

Dadun, University of Navarra